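Concept-based interpretability methods aim to explain the predictions of deep neural network models using a predefined set of semantic concepts. These methods evaluate a trained model on a new "probe" dataset and correlate the model's predictions with the visual concepts labeled in that dataset. Despite their popularity, their limitations are not well understood or articulated in the literature. In this work, we analyze three commonly overlooked factors in concept-based explanations. First, the choice of probe dataset has a profound impact on the generated explanations. Our analysis shows that different probe datasets can lead to very different explanations, which suggests that the explanations do not generalize beyond the probe dataset. Second, we find that the concepts in a probe dataset are often less salient and harder to learn than the classes they are claimed to explain, which calls the correctness of the explanations into question. We argue that only visually salient concepts should be used in concept-based explanations. Finally, although existing methods use hundreds or even thousands of concepts, our human studies reveal a much stricter upper bound of 32 concepts or fewer, beyond which the explanations are much less useful in practice. We make suggestions for the future development and analysis of concept-based interpretability methods. Code for our analysis and user interface can be found at \url{https://github.com/princetonvisualai/overlookedfactors}.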
translated by Google Translate
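Gender biases are known to exist in large-scale visual datasets and can be reflected or even amplified in downstream models. Many prior works propose to mitigate gender bias, often by attempting to remove gender expression information from images. To understand the feasibility and practicality of these approaches, we investigate the $\textit{gender artifacts}$ present in large-scale visual datasets. We define a $\textit{gender artifact}$ as a visual cue that is correlated with gender, focusing specifically on cues that are learnable by a modern image classifier and have an interpretable human corollary. Through our analysis, we find that gender artifacts are ubiquitous in the COCO and OpenImages datasets, occurring everywhere from low-level information (e.g., the mean value of the color channels) to the higher-level composition of the image (e.g., the pose and location of people). Given the prevalence of gender artifacts, we claim that attempts to remove them from such datasets are infeasible. Instead, the responsibility lies with researchers and practitioners to be aware that the distribution of images within datasets is highly gendered, and hence to develop methods that are robust to these distributional shifts across groups.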
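Over the past decade, deep learning models have achieved tremendous success in different areas of machine learning. However, the size and complexity of these models make them difficult to understand. To make them more interpretable, several recent works focus on explaining parts of a deep neural network through human-interpretable semantic attributes. However, it may be impossible to completely explain a complex model using only semantic attributes. In this work, we propose to augment these attributes with a small set of uninterpretable features. Specifically, we develop a novel explanation framework, ELUDE (Explanation via Labelled and Unlabelled DEcomposition), that decomposes a model's prediction into two parts: one that is explainable through a linear combination of the semantic attributes, and another that depends on the set of uninterpretable features. By identifying the latter, we are able to analyze the "unexplained" portion of the model and gain insight into the information the model uses. We show that the set of unlabelled features can generalize to multiple models with the same feature space, compare our work to two popular attribute-oriented methods, Interpretable Basis Decomposition and Concept Bottleneck, and discuss the additional insights ELUDE provides.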
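As machine learning is increasingly applied to high-impact, high-risk domains, many new methods have been proposed to make AI models more human-interpretable. Despite the recent growth of interpretability work, there is a lack of systematic evaluation of the proposed techniques. In this work, we propose HIVE (Human Interpretability of Visual Explanations), a novel human evaluation framework for diverse interpretability methods in computer vision; to the best of our knowledge, this is the first work of its kind. We argue that human studies should be the gold standard for properly evaluating how interpretable a method is to human users. While human studies are often avoided because of challenges associated with cost, study design, and cross-method comparison, we describe how our framework mitigates these issues and conduct IRB-approved studies of four methods that represent the diversity of interpretability work: GradCAM, BagNet, ProtoPNet, and ProtoTree. Our results suggest that explanations (regardless of whether they are actually correct) engender human trust, yet are not distinct enough for users to distinguish between correct and incorrect predictions. Lastly, we also open-source our framework to enable future studies and to encourage more human-centered approaches to interpretability.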
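In this paper, we introduce the novel task of scoring paraphrases of Algebraic Word Problems (AWP) and propose a self-supervised method for it. In the current online pedagogical setting, paraphrasing these problems helps academicians generate multiple syntactically diverse questions for assessments. It also helps introduce variation, ensuring that students have understood the problem rather than merely memorized it or used unfair means to solve it. Current state-of-the-art paraphrase generation models often fail to paraphrase word problems effectively, losing critical information (such as numbers or units) and thereby rendering the question unsolvable. In the context of AWP, a method for scoring paraphrases is needed to train good paraphrasers. We therefore propose ParaQD, a self-supervised paraphrase quality detection method that uses novel data augmentations and can learn latent representations that separate a high-quality paraphrase of an algebraic question from a poor one by a wide margin. Through extensive experiments, we demonstrate that our method outperforms existing state-of-the-art self-supervised methods by up to 32% while also showing impressive zero-shot performance.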
In this paper we explore the task of modeling (semi) structured object sequences; in particular, we focus our attention on the problem of developing a structure-aware input representation for such sequences. In such sequences, we assume that each structured object is represented by a set of key-value pairs which encode the attributes of the structured object. Given a universe of keys, a sequence of structured objects can then be viewed as an evolution of the values for each key over time. We encode and construct a sequential representation using the values for a particular key (Temporal Value Modeling - TVM) and then self-attend over the set of key-conditioned value sequences to create a representation of the structured object sequence (Key Aggregation - KA). We pre-train and fine-tune the two components independently and present an innovative training schedule that interleaves the training of both modules with shared attention heads. We find that this iterative two-part training results in better performance than a unified network with hierarchical encoding, as well as than other methods that use a {\em record-view} representation of the sequence \cite{de2021transformers4rec} or a simple {\em flattened} representation of the sequence. We conduct experiments using real-world data to demonstrate the advantage of interleaving TVM-KA on multiple tasks, along with detailed ablation studies motivating our modeling choices. We find that our approach performs better than flattening sequence objects and also allows us to operate on significantly larger sequences than existing methods.
Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data, and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods that better aid clinicians in their decision-making and improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.
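To make the idea of simulating reduced axial resolution by Gaussian windowing in the spectral domain concrete, here is a minimal sketch. It is not the authors' pipeline. It assumes an idealized, complex-valued spectral signal for a single reflector, a plain inverse DFT as the reconstruction step, and a window width given as a fraction of the spectrum length. The function names and the `sigma_frac` parameter are hypothetical.

```python
import cmath
import math


def gaussian_window(n, sigma_frac):
    """Gaussian window centered on the spectrum; sigma_frac is an assumed
    width parameter (standard deviation as a fraction of n)."""
    center = (n - 1) / 2.0
    sigma = sigma_frac * n
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]


def a_scan_from_spectrum(spectrum):
    """Magnitude of the inverse DFT: spectral samples -> depth profile."""
    n = len(spectrum)
    profile = []
    for z in range(n):
        acc = sum(spectrum[k] * cmath.exp(2j * math.pi * k * z / n)
                  for k in range(n))
        profile.append(abs(acc) / n)
    return profile


def width_at_half_max(profile):
    """Number of depth samples at or above half of the peak value."""
    half = max(profile) / 2.0
    return sum(1 for v in profile if v >= half)


# Idealized spectral signal of a single reflector at depth index 20
# (complex-valued for simplicity; an assumption of this sketch).
n, depth = 64, 20
spectrum = [cmath.exp(-2j * math.pi * k * depth / n) for k in range(n)]

full = a_scan_from_spectrum(spectrum)

# Narrow the effective bandwidth with a Gaussian window, then reconstruct.
window = gaussian_window(n, sigma_frac=0.05)
reduced = a_scan_from_spectrum([s * w for s, w in zip(spectrum, window)])

print(width_at_half_max(full), width_at_half_max(reduced))
```

A narrower window in the spectral domain corresponds to a broader point-spread function in depth, so the reflector stays at the same depth index but its peak spreads over more samples.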
Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: ``The output depends only on a small (but unknown) segment of the input.'' In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output are often used as a way to peek into the ``reasoning'' of the network. We make this notion more precise for a variant of the classification problem that we term selective dependence classification (SDC) when used with attention model architectures. Under such a setting, we demonstrate various error modes where an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate and mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity and demonstrate that these algorithms help improve interpretability.
Real-life tools for decision-making in many critical domains are based on ranking results. With the increasing awareness of algorithmic fairness, recent works have presented measures for fairness in ranking. Many of those definitions consider the representation of different ``protected groups'' in the top-$k$ ranked items, for any reasonable $k$. Given the protected groups, confirming algorithmic fairness is a simple task. However, the groups' definitions may be unknown in advance. In this paper, we study the problem of detecting groups with biased representation in the top-$k$ ranked items, eliminating the need to pre-define protected groups. The number of such possible groups can be exponential, making the problem hard. We propose efficient search algorithms for two different fairness measures: global representation bounds and proportional representation. We then propose a method to explain the bias in the representations of groups utilizing the notion of Shapley values. We conclude with an experimental study, showing the scalability of our approach and demonstrating the usefulness of the proposed algorithms.
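The detection problem can be illustrated with a brute-force sketch. This is not the efficient search the abstract proposes: it enumerates every attribute-value pattern, which is exactly the exponential blow-up those algorithms are meant to avoid. It also assumes one particular reading of proportional representation, where a group is flagged when its count in the top-$k$ falls below `alpha` times its expected share. The function name, the `alpha` parameter, and the toy data are all hypothetical.

```python
from itertools import combinations


def matches(item, pattern):
    """True if the item has every attribute=value pair in the pattern."""
    return all(item.get(attr) == val for attr, val in pattern)


def underrepresented_groups(ranked, attrs, k, alpha=0.8):
    """Exhaustively flag groups whose top-k count is below alpha times
    their proportional share (an assumed formulation, not the paper's code)."""
    n = len(ranked)
    top = ranked[:k]

    # Enumerate every conjunction of attribute=value pairs seen in the data.
    patterns = set()
    for item in ranked:
        for r in range(1, len(attrs) + 1):
            for combo in combinations(attrs, r):
                patterns.add(tuple((a, item[a]) for a in combo))

    flagged = []
    for pattern in sorted(patterns):
        size = sum(matches(it, pattern) for it in ranked)
        in_top = sum(matches(it, pattern) for it in top)
        expected = k * size / n  # share expected under proportionality
        if in_top < alpha * expected:
            flagged.append((pattern, in_top, expected))
    return flagged


# Toy ranking (best first); attribute names and values are illustrative.
ranked = [
    {"gender": "M", "region": "north"},
    {"gender": "M", "region": "south"},
    {"gender": "M", "region": "north"},
    {"gender": "F", "region": "north"},
    {"gender": "F", "region": "south"},
    {"gender": "F", "region": "south"},
    {"gender": "M", "region": "south"},
    {"gender": "F", "region": "north"},
]

flagged = underrepresented_groups(ranked, ["gender", "region"], k=4, alpha=0.8)
for pattern, in_top, expected in flagged:
    print(pattern, in_top, expected)
```

Note that the intersectional group (`gender=F`, `region=south`) is reported alongside the single-attribute groups it belongs to. Surfacing such groups without pre-defining them is the point of the detection task.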
Previous fine-grained datasets mainly focus on classification and are often captured in a controlled setup, with the camera focusing on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels of multiple vehicle types organized in a three-level hierarchy. While previous classification datasets also include makes for different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The FGVD dataset is challenging as it has vehicles in complex traffic scenarios with intra-class and inter-class variations in type, scale, pose, occlusion, and lighting conditions. Current object detectors like YOLOv5 and Faster R-CNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with providing baseline results for existing object detectors on the FGVD dataset, we also present the results of a combination of an existing detector and the recent Hierarchical Residual Network (HRN) classifier for the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.